Nearly optimal Bayesian Shrinkage for High Dimensional Regression
During the past decade, shrinkage priors have received much attention in
Bayesian analysis of high-dimensional data. In this paper, we study this
problem for high-dimensional linear regression models. We show that if the
shrinkage prior has a heavy and flat tail and allocates a sufficiently large
probability mass in a very small neighborhood of zero, then its posterior
properties are as good as those of the spike-and-slab prior. While enjoying
its efficiency in Bayesian computation, the shrinkage prior can achieve the
same nearly optimal contraction rate and selection consistency as the
spike-and-slab prior. Our numerical results show that, under posterior
consistency, Bayesian methods can yield much better results in variable
selection than regularization methods such as Lasso and SCAD. We also
establish a Bernstein-von Mises type result comparable to that of Castillo et
al. (2015); this result leads to a convenient way to quantify uncertainties of
the regression coefficient estimates, which has been beyond the ability of
regularization methods.
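A rough illustration of the two prior properties the abstract requires (a heavy, flat tail plus substantial mass near zero), using the horseshoe prior as an example of a shrinkage prior with these properties. The paper does not prescribe this particular prior, and the global scale `tau` below is an arbitrary choice for the demonstration:

```python
import numpy as np

# Horseshoe draws: beta_j = tau * lambda_j * z_j with lambda_j half-Cauchy.
# This family has both properties the abstract asks of a shrinkage prior:
# a pole of mass at zero and a polynomially heavy tail.
rng = np.random.default_rng(0)
n = 200_000
tau = 0.1                                  # global shrinkage level (assumed)
lam = np.abs(rng.standard_cauchy(n))       # local scales, half-Cauchy(0, 1)
beta = tau * lam * rng.standard_normal(n)  # horseshoe prior draws
gauss = rng.standard_normal(n)             # N(0, 1) baseline for comparison

mass_near_zero = np.mean(np.abs(beta) < 0.01)
gauss_near_zero = np.mean(np.abs(gauss) < 0.01)
print(mass_near_zero, gauss_near_zero)               # far more mass at zero
print(np.quantile(np.abs(beta), 0.999),
      np.quantile(np.abs(gauss), 0.999))             # far heavier tail
```

A spike-and-slab prior achieves the same two properties with a discrete point mass at zero, which is what makes its posterior computation more expensive.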
Improving SAMC using smoothing methods: Theory and applications to Bayesian model selection problems
Stochastic approximation Monte Carlo (SAMC) has recently been proposed by
Liang, Liu and Carroll [J. Amer. Statist. Assoc. 102 (2007) 305--320] as a
general simulation and optimization algorithm. In this paper, we propose to
improve its convergence using smoothing methods and discuss the application of
the new algorithm to Bayesian model selection problems. The new algorithm is
tested through a change-point identification example. The numerical results
indicate that the new algorithm can outperform SAMC and reversible jump MCMC
significantly for the model selection problems. The new algorithm represents a
general form of the stochastic approximation Markov chain Monte Carlo
algorithm. It allows multiple samples to be generated at each iteration, and a
bias term to be included in the parameter updating step. A rigorous proof for
the convergence of the general algorithm is established under verifiable
conditions. This paper also provides a framework for improving the efficiency
of Monte Carlo simulations by incorporating nonparametric techniques.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics, DOI: http://dx.doi.org/10.1214/07-AOS577.
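A minimal sketch of the basic SAMC update that the paper's smoothed variant builds on: the sample space is partitioned into subregions, and log-weights theta are adapted by stochastic approximation so that all subregions, including low-probability ones, are visited at prescribed frequencies. The bimodal target, the partition, and all tuning constants below are illustrative assumptions, and the smoothing and multiple-samples-per-iteration extensions of the paper are not shown:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_f(x):
    # bimodal target: mixture of N(-3, 1) and N(3, 1), up to a constant
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

edges = np.linspace(-8, 8, 9)       # 8 subregions partitioning [-8, 8]
m = len(edges) - 1
pi = np.full(m, 1.0 / m)            # desired sampling frequencies
theta = np.zeros(m)                 # log-weights, adapted on the fly
t0 = 500.0                          # gain-sequence constant (assumed)

def region(x):
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, m - 1))

x = 0.0
visits = np.zeros(m)
for t in range(1, 20001):
    # Metropolis step targeting f(x) / exp(theta[J(x)])
    y = x + rng.normal(scale=2.0)
    if -8 < y < 8:
        log_ratio = (log_f(y) - theta[region(y)]) - (log_f(x) - theta[region(x)])
        if np.log(rng.uniform()) < log_ratio:
            x = y
    e = np.zeros(m)
    e[region(x)] = 1.0
    gamma = t0 / max(t0, t)          # decreasing gain sequence
    theta += gamma * (e - pi)        # stochastic approximation update
    visits[region(x)] += 1

# The adapted weights let the chain cross the low-density valley between the
# modes; visit frequencies approach the prescribed pi.
print(np.round(visits / visits.sum(), 2))
```

The paper's generalization replaces the single indicator `e` with a smoothed estimate built from multiple samples per iteration, which introduces the bias term mentioned in the abstract.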
A Double Regression Method for Graphical Modeling of High-dimensional Nonlinear and Non-Gaussian Data
Graphical models have long been studied in statistics as a tool for inferring
conditional independence relationships among a large set of random variables.
Most existing works on graphical modeling focus on the case where the data
are Gaussian or mixed and the variables are linearly dependent. In this paper,
we propose a double regression method for learning graphical models under the
high-dimensional nonlinear and non-Gaussian setting, and prove that the
proposed method is consistent under mild conditions. The proposed method works
by performing a series of nonparametric conditional independence tests. The
conditioning set of each test is reduced via a double regression procedure
where a model-free sure independence screening procedure or a sparse deep
neural network can be employed. The numerical results indicate that the
proposed method works well for high-dimensional nonlinear and non-Gaussian
data.
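A sketch of a single nonparametric conditional-independence test of the kind the method chains together. The paper reduces each test's conditioning set via a double regression step (model-free sure independence screening or a sparse deep neural network); here a crude k-nearest-neighbor smoother stands in as the nonparametric regressor, and correlation of the two residual series proxies for the test. All data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
Z = rng.standard_normal(n)                                   # conditioning variable
X = Z ** 2 + 0.3 * rng.standard_normal(n)
Y_indep = np.sin(Z) + 0.3 * rng.standard_normal(n)           # X indep. of Y given Z
Y_dep = np.sin(Z) + 0.8 * X + 0.3 * rng.standard_normal(n)   # X, Y dependent given Z

def residual(target, k=50):
    # crude k-NN regression of target on Z (the reduced conditioning set),
    # excluding each point itself to avoid trivial overfitting
    d = np.abs(Z[:, None] - Z[None, :])
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]
    return target - target[nbrs].mean(axis=1)

r_x = residual(X)
r1 = abs(np.corrcoef(r_x, residual(Y_indep))[0, 1])
r2 = abs(np.corrcoef(r_x, residual(Y_dep))[0, 1])
print(r1, r2)   # near zero for the independent pair, clearly nonzero otherwise
```

Once both variables are regressed on the reduced conditioning set, any remaining dependence between the residuals signals a conditional edge between them in the graph.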
Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network
Sufficient dimension reduction is a powerful tool for extracting the core
information hidden in high-dimensional data and has many potentially important
applications in machine learning tasks. However, the existing nonlinear
sufficient dimension reduction methods often lack the scalability necessary for
dealing with large-scale data. We propose a new type of stochastic neural
network under a rigorous probabilistic framework and show that it can be used
for sufficient dimension reduction for large-scale data. The proposed
stochastic neural network is trained using an adaptive stochastic gradient
Markov chain Monte Carlo algorithm, whose convergence is rigorously studied in
the paper as well. Through extensive experiments on real-world classification
and regression problems, we show that the proposed method compares favorably
with the existing state-of-the-art sufficient dimension reduction methods and
is computationally more efficient for large-scale data.
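For orientation only, a sketch of classical sliced inverse regression, the linear predecessor of the nonlinear, neural-network-based method the abstract describes (the paper's stochastic neural network and its adaptive SG-MCMC training are not reproduced here). The synthetic model and all dimensions are invented for the example:

```python
import numpy as np

# Sliced inverse regression: recover the single sufficient direction beta
# in y = g(beta' x) + noise by eigen-decomposing the covariance of
# slice-wise means of the whitened predictors.
rng = np.random.default_rng(2)
n, p = 4000, 10
beta = np.zeros(p)
beta[0] = beta[1] = 1.0
beta /= np.linalg.norm(beta)
X = rng.standard_normal((n, p))
y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / n
L = np.linalg.cholesky(Sigma)
Z = np.linalg.solve(L, Xc.T).T            # whitened predictors

H = 10                                    # number of slices on y
M = np.zeros((p, p))
for idx in np.array_split(np.argsort(y), H):
    mh = Z[idx].mean(axis=0)              # slice mean of whitened predictors
    M += (len(idx) / n) * np.outer(mh, mh)

w = np.linalg.eigh(M)[1][:, -1]           # leading eigenvector
b_hat = np.linalg.solve(L.T, w)           # map back to the original scale
b_hat /= np.linalg.norm(b_hat)
cos = abs(b_hat @ beta)
print(cos)                                # close to 1: direction recovered
```

Methods like this recover only linear reductions; the scalability problem the abstract targets arises when the reduction itself must be a nonlinear map of the predictors.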
Bayesian Peak Picking for NMR Spectra
Protein structure determination is a very important topic in structural genomics, which helps in understanding a variety of biological functions such as protein-protein interactions, protein-DNA interactions, and so on. Nowadays, nuclear magnetic resonance (NMR) is often used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and to use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
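A sketch of the modeling idea only: a 2-D "spectrum" whose peaks are bivariate Gaussian components. The paper fits such a mixture with stochastic approximation Monte Carlo under a Bayesian variable-selection formulation; here a plain EM fit via scikit-learn's `GaussianMixture` stands in just to illustrate the mixture model, and the peak locations, widths, and counts are invented for the example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic peaks in a (1H, 15N)-style chemical-shift plane
rng = np.random.default_rng(4)
true_peaks = np.array([[2.0, 110.0], [4.5, 120.0]])
pts = np.vstack([
    rng.multivariate_normal(true_peaks[0], 0.01 * np.eye(2), 300),
    rng.multivariate_normal(true_peaks[1], 0.01 * np.eye(2), 300),
])

# Fit a 2-component bivariate Gaussian mixture; each fitted mean is a
# picked peak location
gm = GaussianMixture(n_components=2, random_state=0).fit(pts)
est = gm.means_[np.argsort(gm.means_[:, 0])]   # sort components by first axis
print(est)
```

In the paper's formulation the number of components is itself unknown, which is what turns peak picking into a variable selection problem and motivates the SAMC sampler rather than a fixed-dimension EM fit.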